Comparing Sanskrit Texts for Critical Editions: The Sequences Move Problem

نویسندگان

  • Nicolas Béchet
  • Marc Csernel
چکیده

A critical edition takes into account various versions of the same text in order to show the differences between two distinct versions, in terms of words that have been missing, changed, omitted or displaced. Traditionally, Sanskrit is written without spaces between words, and the word order can be changed without altering the meaning of a sentence. This paper describes the characteristics which make Sanskrit text comparisons a specific matter. It presents two different methods for comparing Sanskrit texts, which can be used to develop a computer assisted critical edition. The first one method uses the L.C.S., while the second one uses the global alignment algorithm. Comparing them, we see that the second method provides better results, but that neither of these methods can detect when a word or a sentence fragment has been moved. We then present a method based on N-gram that can detect such a movement when it is not too far from its original location. We show how the method behaves on several examples.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comparing Sanskrit Texts for Critical Editions

Traditionally Sanskrit is written without blank, sentences can make thousands of characters without any separation. A critical edition takes into account all the different known versions of the same text in order to show the differences between any two distinct versions, in term of words missing, changed or omitted. This paper describes the Sanskrit characteristics that make text comparisons di...

متن کامل

Citation Matching in Sanskrit Corpora Using Local Alignment

Citation matching is the problem of finding which citation occurs in a given textual corpus. Most existing citation matching work is done on scientific literature. The goal of this paper is to present methods for performing citation matching on Sanskrit texts. Exact matching and approximate matching are the two methods for performing citation matching. The exact matching method checks for exact...

متن کامل

Critical Edition of Sanskrit Texts

A critical edition takes into account all the different known versions of the same text in order to show the differences between any two distinct versions. The construction of a critical edition is a long and, sometimes, tedious work. Some software that help the philologist in such a task have been available for a long time for the European languages. However, such software does not exist yet f...

متن کامل

Digital Documentary Editions and the Others

It is a truth universally acknowledged that documentary editions have found a very welcoming home in cyberspace. Documentary editing has often been considered a lower form of scholarship, as suggested by its being commonly called "noncritical editing," a name that barely hides the conviction of its being a non-or prescholarly endeavor. 1 However, this allegedly humble form of editing has now ta...

متن کامل

Computational Algorithms Based on the Paninian System to Process Euphonic Conjunctions for Word Searches

Searching for words in Sanskrit E-text is a problem that is accompanied by complexities introduced by features of Sanskrit such as euphonic conjunctions or ‘sandhis’. A word could occur in an E-text in a transformed form owing to the operation of rules of sandhi. Simple word search would not yield these transformed forms of the word. Further, there is no search engine in the literature that can...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Polibits

دوره 45  شماره 

صفحات  -

تاریخ انتشار 2012